Question 1:

AAPL <- read.csv("AAPL.csv", header = TRUE) #reading each file; View() can then be used to inspect it
ADBE <- read.csv("ADBE.csv", header = TRUE)
AMZN <- read.csv("AMZN.csv", header = TRUE)
BABA <- read.csv("BABA.csv", header = TRUE)
BIDU <- read.csv("BIDU.csv", header = TRUE)
FB <- read.csv("FB.csv", header = TRUE)
MSFT <- read.csv("MSFT.csv", header = TRUE)
PCTY <- read.csv("PCTY.csv", header = TRUE)
TWTR <- read.csv("TWTR.csv", header = TRUE)
ZM <- read.csv("ZM.csv", header = TRUE)

Viewing each file from the environment shows that all ten files hold the same type of data: daily stock prices for 10 different companies over different time periods. Each file includes the opening, closing, adjusted closing, high (maximum), and low (minimum) prices, as well as the total volume of shares traded on a given day. Although the files share the same format, they do not cover the same days for all 10 companies; the data for some companies is more recent than for others. Running is.na() on each file in the console returned FALSE for every row that was read, so we conclude that there are no missing values in any of the files.
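The missing-value check described above can also be written so that it is reproducible rather than read off the console. The following is only a minimal sketch: the two small data frames and the `stocks` list are made-up stand-ins for the real files, not part of the original analysis.

```r
# Toy stand-ins for two of the stock data frames (values are made up)
toy_a <- data.frame(Date = c("2020-01-02", "2020-01-03"),
                    High = c(10.5, 11.0),
                    Volume = c(1000, 1200))
toy_b <- data.frame(Date = c("2020-01-02", "2020-01-03"),
                    High = c(20.1, NA),      # one deliberately missing value
                    Volume = c(500, 700))

stocks <- list(A = toy_a, B = toy_b)

# Count the missing cells in each data frame; 0 means no missing values
na_counts <- sapply(stocks, function(d) sum(is.na(d)))
na_counts
```

With the real data, the same sapply() call could be run on a named list holding the ten data frames read in above; a count of 0 for every file would confirm the conclusion.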

Question 2:

For our simple analysis, we are going to use the summary() function to find the minimum, maximum, and average values of the High column for three of the companies (Amazon, Facebook, and Microsoft) ONLY for the year 2020 (the 185 rows of 2020 data selected below). The goal is to find out which of these three companies, on average, had the highest prices in 2020. We chose 2020 because the Covid-19 pandemic greatly impacted the global economy that year, so it is interesting to see which of these three companies had the highest share prices during it.

summary(AMZN[5696:5880, "High"]) #using summary() instead of individual mean, median functions to display all the information at once
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1759    1996    2437    2529    3084    3552
summary(FB[1918:2102, "High"])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   148.2   203.6   222.4   223.6   245.2   304.7
summary(MSFT[8522:8706, "High"])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   140.6   170.0   185.9   187.6   207.6   232.9

From the results above, we can see that in 2020 Amazon reached the highest share price of the three companies ($3,552), whereas Microsoft had the lowest peak price ($232.9). Secondly, Amazon also had the highest daily high on average, with a mean of $2,529. Again, Microsoft had the lowest average daily high, at $187.6.
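The row ranges used above were found by hand. As an alternative, the 2020 rows could be selected by date. This is a hedged sketch on made-up data: it assumes the dates are stored as text in a Date column, and the format string "%Y-%m-%d" is an assumption that would have to match how the real files write their dates.

```r
# Toy data (made up); the real files are assumed to store dates as text
toy <- data.frame(Date = c("2019-12-30", "2019-12-31", "2020-01-02", "2020-01-03"),
                  High = c(100, 101, 105, 103))

# Convert to Date class; the format string is an assumption and must match the file
toy$Date <- as.Date(toy$Date, format = "%Y-%m-%d")

# Keep only the rows whose year is 2020
toy_2020 <- toy[format(toy$Date, "%Y") == "2020", ]

summary(toy_2020$High)
```

Selecting by date avoids having to look up row numbers again if a file is updated with newer rows.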

Now, let’s look at the summary of the volume of shares traded for these three companies in 2020.

summary(AMZN[5696:5880, "Volume"]) #again, using summary() to obtain the necessary information
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##  2088000  3529300  4666300  5111896  6128300 15567300
summary(FB[1918:2102, "Volume"])
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 10015300 16110700 21428300 23929403 29183400 76343900
summary(MSFT[8522:8706, "Volume"])
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
## 17958900 28774200 34754500 41129223 50479600 97073600

Here, we find something very interesting. Despite Amazon having the highest share prices, its shares had the lowest average daily trading volume in 2020. As a matter of fact, the order is completely reversed for this part of the data. On average, Microsoft had the most shares traded per day (41,129,223), followed by Facebook with 23,929,403 shares per day and Amazon with 5,111,896 shares per day.
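The reversal of the ordering can be checked directly from the means reported by summary(). The sketch below copies those means from the output above; the vector names are only illustrative.

```r
# Mean daily high and mean daily volume, copied from the summary() output above
mean_high   <- c(AMZN = 2529, FB = 223.6, MSFT = 187.6)
mean_volume <- c(AMZN = 5111896, FB = 23929403, MSFT = 41129223)

# Company names ordered from largest to smallest on each measure
by_price  <- names(sort(mean_high, decreasing = TRUE))
by_volume <- names(sort(mean_volume, decreasing = TRUE))

by_price
by_volume

# TRUE when the volume ranking is the exact reverse of the price ranking
identical(by_price, rev(by_volume))
```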

Question 3:

For our first figure, we will use the quantmod and plotly packages to construct a candlestick plot of Amazon's stock prices, using the same 2020 rows that were selected in Question 2.

r = getOption("repos")
r["CRAN"] = "http://cran.us.r-project.org" #need to set a cran mirror for this package, otherwise R gives a trying to use cran without setting a mirror error
options(repos = r)
install.packages("quantmod") #installing and loading the quantmod and plotly packages
## Installing package into 'C:/Users/naufa/AppData/Local/R/win-library/4.3'
## (as 'lib' is unspecified)
## package 'quantmod' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\naufa\AppData\Local\Temp\Rtmpam42ir\downloaded_packages
install.packages("plotly")
## Installing package into 'C:/Users/naufa/AppData/Local/R/win-library/4.3'
## (as 'lib' is unspecified)
## package 'plotly' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\naufa\AppData\Local\Temp\Rtmpam42ir\downloaded_packages
library(quantmod) #loading them into our file
## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: TTR
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(plotly)
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
AMZN1 <- tail(AMZN, 185) #using the tail function to analyze the last 185 rows of the Amazon dataset only
candlestick <- AMZN1 %>% plot_ly(x = ~Date, type = "candlestick",
        open = ~Open, close = ~Close,
        high = ~High, low = ~Low)
candlestick <- candlestick %>% layout(title = "Amazon Stock Chart 2020") 

candlestick

From the chart, we can observe that the “candles” for Amazon’s stock prices were quite short at the beginning of 2020. This means that for the first few months of 2020, the price did not fluctuate much within each day and the stock was losing momentum. For the rest of the period, we notice an upward trend in the stock price. One more thing that stands out in the figure is the very long tail of the candle on September 4, 2020. The body of that candle is very short, which reflects the indecision of buyers and sellers in the market. The price swung up and down during the day but closed at about the same level at which it opened, leading buyers and sellers nowhere.
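The reading of candle shapes above can be expressed numerically: the body is the gap between the open and the close, and the tails cover the rest of the high-to-low range. The sketch below uses made-up prices, and the 10% threshold for calling a day "indecisive" is an assumption chosen only for illustration.

```r
# Toy OHLC rows (made-up prices); the last row has a short body and long tails
toy <- data.frame(Open  = c(100, 102, 110),
                  High  = c(103, 106, 118),
                  Low   = c( 99, 101,  95),
                  Close = c(102, 105, 110.5))

toy$body  <- abs(toy$Close - toy$Open)   # height of the candle body
toy$range <- toy$High - toy$Low          # full high-to-low span, tails included

# Share of the day's range covered by the body; small values suggest indecision
toy$body_share <- toy$body / toy$range

# Flag candles whose body is under 10% of the range (threshold is an assumption)
toy$indecisive <- toy$body_share < 0.10
toy
```

Applied to the Amazon rows, a day like September 4, 2020 would be expected to show a small body_share, although the exact value depends on the data.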

Next, we will create a scatter plot for the highest stock prices within the time frame used in Question 2 for Microsoft.

MSFTplot <- MSFT[8522:8706, ] # Day 1 is 1/2/2020 and Day 185 is 9/24/2020
plot(1:185, MSFTplot$High, main = "Microsoft Stock Peaks, 2020",
     xlab = "Days", ylab = "High", pch = 19, frame = FALSE) #looks like a dragon :)

If we look at the scatter plot above, we can see that Microsoft’s stock shows an upward trend at the start of the year. Around February, however, the stock plummets and keeps falling through March. In April, the stock picks back up and keeps rising for most of the remaining period. During September we again see a slight dip; it would have been interesting to see what happened after that, but the data ends on September 24.

For our third figure, we will create a boxplot of the volume of Facebook shares traded per day during 2020.

FBplot <- FB[1918:2102, ] #just using the base R boxplot here
boxplot(FBplot$Volume, horizontal = TRUE, xlab="Volume", main = "Facebook, Inc 2020 Share Volume Boxplot") 

From the boxplot, we can see that the median volume of Facebook shares traded per day in 2020 was around 21 million. Because base R draws a modified boxplot, it also gives us information about the outliers in the data. Even though the maximum volume traded in a single day in 2020 is about 76 million, the boxplot reveals that there were only a handful of such abnormally large values. The upper whisker (the modified maximum) appears to be around 48-49 million, and there were only six days during the year when 50 million or more Facebook shares were traded.
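The whisker ends and outliers read off the plot can also be computed. The sketch below uses made-up volumes rather than the Facebook data; boxplot.stats() is the base R function that boxplot() relies on, with whiskers ending at the most extreme points within 1.5 times the box length.

```r
# Toy daily volumes (made up), with two unusually large days
vol <- c(12, 15, 16, 18, 20, 21, 22, 24, 26, 29, 60, 76) * 1e6

# Same rule as boxplot(): points beyond 1.5 * (upper hinge - lower hinge)
# from the box are reported as outliers
bs <- boxplot.stats(vol)

bs$stats            # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out              # points drawn individually as outliers
sum(vol >= 50e6)    # number of days at or above 50 million
```

Running the same calls on FBplot$Volume would give the exact upper whisker and the list of outlying days instead of values estimated by eye.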

Question 4:

Figure 1

drawFigure1 <- function(filename, r, n){ # Using the same plot_ly() call as before, replacing Amazon's values with the three parameters: filename is the data frame, r the first row, and n the last row
  candlestick <- filename[r:n, ] %>% plot_ly(x = ~Date, type = "candlestick",
          open = ~Open, close = ~Close,
          high = ~High, low = ~Low)
  candlestick <- candlestick %>% layout(title = "Stock Chart")
  candlestick
}

drawFigure1(BABA, 10, 20)

Figure 2

drawFigure2 <- function(filename, r, n){ # just need to replace the particular values used in the earlier code with the parameters of the function, r and n; the title and x label are generic because r and n can select any rows
  plot(r:n, filename[r:n, "High"], main = "Stock Peaks",
       xlab = "Row", ylab = "High", pch = 19, frame = FALSE)
}

drawFigure2(TWTR, 15, 20)

Figure 3

drawFigure3 <- function(filename, r, n){ #boxplot visually represents the median, quartiles, range, max, and min values, NOT the mean or the mode
  boxplot(filename[r:n, "Volume"], horizontal = TRUE, xlab="Volume", main = "Share Volume Boxplot")
}

drawFigure3(BIDU, 1, 1515)

Question 5:

Added comments.

Question 6:

For some additional analysis, we will create a scatter plot of Zoom’s adjusted closing share prices for the two months around the start of 2020 (December 2019 and January 2020).

ZMplot <- ZM[158:199, ] # Day 1 is 12/2/2019 and Day 42 is 1/31/2020
plot(1:42, ZMplot$Adj.Close, main = "Zoom Stock, Dec 2019 - Jan 2020", 
     xlab = "Days", ylab = "Adjusted Closing Price", pch = 19, frame = FALSE) #using adjusted closing price here instead of the highs 

#another dragon!

If we look at the graph above, we can see that at the beginning of December 2019, Zoom's share price was declining. However, after the 10th day there was a sharp rise in the share price, and within a span of 20 days it crossed $75. This is plausible because the first Covid-19 outbreak happened in Wuhan in December 2019, and on January 23, 2020, China imposed the first lockdown. Even before the lockdown was imposed, people were frightened by the outbreak and were choosing to stay at home, which could have led to increased use of Zoom software and the resulting rise in share price.
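The turning point described above can be located without reading it off the plot by computing the day-over-day percentage change. The prices below are made up for illustration and are not Zoom's actual values.

```r
# Toy adjusted closing prices (made up): a decline followed by a sharp rise
adj_close <- c(70, 68, 66, 65, 72, 75, 76)

# Day-over-day percentage change; the first day has no previous value
pct_change <- c(NA, diff(adj_close) / head(adj_close, -1) * 100)

# Index of the day with the largest single-day gain (NA is ignored)
which.max(pct_change)
```

Applied to ZMplot$Adj.Close, the same two lines would give the day on which the price rose the most within the window.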

Lastly, we will look at a time series of Adobe’s stock over four months, from November 2019 to February 2020.

stock <- ADBE[8375:8455, ] #an actual time series plot, like the ones that show up when we look up stock prices for companies on google
fig <- plot_ly(stock, x = ~Date, y = ~High, type = 'scatter', mode = 'lines', width = 900)%>% #the lines mode connects the dots of a scatterplot and makes a time series line; x, y, and width are given here so that there is a single trace and width is not set in layout()
  layout(showlegend = FALSE)
fig <- fig %>%
  layout(
         xaxis = list(zerolinecolor = '#ffffff',
                      zerolinewidth = 2,
                      gridcolor = '#ffffff'),
         yaxis = list(zerolinecolor = '#ffffff',
                      zerolinewidth = 2,
                      gridcolor = '#ffffff'),
         plot_bgcolor='#e5ecf6')
fig

As is evident from the figure above, Adobe’s stock stayed high throughout these four months. Again, as the Covid-19 pandemic emerged, software companies started doing well because a lot of work shifted to computers, and people were spending more time either working or learning skills that might involve software such as Adobe's.